AN AUGMENTED LAGRANGIAN METHOD FOR CONIC CONVEX PROGRAMMING 



N. S. AYBAT* AND G. IYENGAR†

Abstract. We propose a new first-order augmented Lagrangian algorithm ALCC for solving convex conic programs of the form

$$\min\{\rho(x) + \gamma(x) : Ax - b \in \mathcal{K},\ x \in \chi\},$$

where $\rho : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $\gamma : \mathbb{R}^n \to \mathbb{R}$ are closed, convex functions, $\gamma$ has a Lipschitz continuous gradient, $A \in \mathbb{R}^{m \times n}$, $\mathcal{K} \subset \mathbb{R}^m$ is a closed convex cone, and $\chi \subset \mathrm{dom}(\rho)$ is a "simple" convex compact set such that optimization problems of the form $\min\{\rho(x) + \|x - \bar{x}\|_2^2 : x \in \chi\}$ can be efficiently solved. We show that any limit point of the primal ALCC iterates is an optimal solution of the conic convex problem, and the dual ALCC iterates have a unique limit point that is a Karush-Kuhn-Tucker (KKT) point of the conic program. We also show that for any $\epsilon > 0$, the primal ALCC iterates are $\epsilon$-feasible and $\epsilon$-optimal after $O(\log(\epsilon^{-1}))$ iterations, which require solving $O(\epsilon^{-1}\log(\epsilon^{-1}))$ problems of the form $\min_x\{\rho(x) + \|x - \bar{x}\|_2^2 : x \in \chi\}$.

1. Introduction. In this paper we propose an inexact augmented Lagrangian algorithm (ALCC) for 
solving conic convex problems of the form 

$$(P): \quad \min\{\rho(x) + \gamma(x) : Ax - b \in \mathcal{K},\ x \in \chi\}, \tag{1.1}$$

where $\rho : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $\gamma : \mathbb{R}^n \to \mathbb{R}$ are proper, closed, convex functions, $\gamma$ has a Lipschitz continuous gradient $\nabla\gamma$ with Lipschitz constant $L_\gamma$, $A \in \mathbb{R}^{m \times n}$, $\mathcal{K} \subset \mathbb{R}^m$ is a nonempty, closed, convex cone, and $\chi \subset \mathrm{dom}(\rho)$ is a "simple" compact set in the sense that optimization problems of the form

$$\min_{x \in \chi}\left\{\rho(x) + \|x - \bar{x}\|_2^2\right\} \tag{1.2}$$

can be efficiently solved for any $\bar{x} \in \mathbb{R}^n$. Note that we do not require $A \in \mathbb{R}^{m \times n}$ to satisfy any additional regularity properties. For notational convenience, we set

$$p(x) := \rho(x) + \gamma(x).$$

In some problems, the compact set $\chi$ is explicitly present. For example, in a zero-sum game the decision $x$ represents a mixed strategy and the set $\chi$ is a simplex. In others, $\chi$ may not be explicitly present, but one can formulate an equivalent problem where the vector of decision variables can be constrained to lie in a bounded feasible set without any loss of generality. For example, if $\gamma$ is strongly convex, or if $\rho$ is a norm and $\gamma(\cdot) \ge 0$, then the decision vector $x$ can be restricted to lie in an appropriately defined norm ball centered at any feasible solution.

We assume that the following constraint qualification holds for (P). 

Assumption 1.1. The problem (P) in (1.1) has a Karush-Kuhn-Tucker (KKT) point, i.e., there exists $y^* \in \mathcal{K}^*$ such that $g_0(y^*) := \inf\{p(x) - \langle y^*, Ax - b\rangle : x \in \chi\} = p^* > -\infty$, where $p^*$ denotes the optimal value of (P) and $\mathcal{K}^*$ denotes the dual cone corresponding to $\mathcal{K}$, i.e., $\mathcal{K}^* := \{y \in \mathbb{R}^m : \langle y, x\rangle \ge 0\ \ \forall x \in \mathcal{K}\}$.
Assumption 1.1 clearly holds whenever there exists $x \in \mathrm{relint}(\chi)$ such that $Ax - b \in \mathrm{int}(\mathcal{K})$ [4].

1.1. Special cases. Many important optimization problems are special cases of (1.1). Below, we briefly discuss some examples.

Min-max games with convex loss function: This problem is a generalization of the matrix game discussed in [11]. The decision maker can choose from $n$ possible actions. Let $x \in \mathbb{R}^n$ denote a mixed strategy over the set of actions, i.e., $x \in \chi := \{x : \sum_{j=1}^n x_j = 1,\ x \ge 0\}$. Suppose the mixed strategy $x$ must satisfy constraints of the form $Ax - b \in \mathcal{K}$. These constraints could be modeling average cost constraints. For example, one may have constraints of the form $Ax \le b$, where $A \in \mathbb{R}^{m \times n}$ and $A_{ij}$ denotes the amount of resource $i$ consumed by action $j$. One may also have constraints that restrict the total probability weight of some given subsets of actions.

The adversary has $p$ possible actions. The expected loss to the decision maker when she chooses the mixed strategy $x \in \chi$ and the adversary chooses the mixed strategy $y \in \mathbb{R}^p$ is given by

$$\rho(x) + y^T C x - \phi(y),$$

where $\rho$ is a convex function, and $\phi$ is a strongly convex function. Then the decision maker's optimization problem that minimizes the expected worst case loss is given by

$$\min\{\rho(x) + \gamma(x) : Ax - b \in \mathcal{K},\ x \in \chi\}, \tag{1.3}$$

where

$$\gamma(x) = \max\left\{y^T C x - \phi(y) : \sum_{k=1}^{p} y_k = 1,\ y \ge 0\right\}. \tag{1.4}$$

From Danskin's theorem, it follows that $\nabla\gamma(x) = C^T y(x)$, where $y(x)$ denotes the unique maximizer in (1.4) for a given $x$. In [11], Nesterov showed that $\nabla\gamma$ is Lipschitz continuous with Lipschitz constant $\sigma_{\max}(C)^2/\tau$, where $\tau$ denotes the convexity parameter for the strongly convex function $\phi$. Thus, it follows that the minimax optimization problem (1.3) is a special case of (1.1).

Problems with semidefinite constraints: Let $\mathcal{S}^m$ denote the set of $m \times m$ symmetric matrices, and let $\mathcal{S}^m_+$ denote the closed convex cone of $m \times m$ symmetric positive semidefinite matrices. A convex optimization problem with a linear matrix inequality constraint is of the form

$$\min\left\{p(x) : \sum_{j=1}^{n} A_j x_j + B \in \mathcal{S}^m_+\right\}, \tag{1.5}$$

where $p$ is a convex function, $B \in \mathcal{S}^m$, and $A_j \in \mathcal{S}^m$ for $j = 1, \ldots, n$. Convex problems of the form (1.5) can model many applications in engineering, statistics and combinatorial optimization [4]. In most of these applications, either the constraints imply that the decision vector $x$ is bounded, or one can often establish that the optimal solution lies in a norm-ball. In such cases, (1.5) is a special case of (1.1). Consider the $\ell_1$-minimization problem of the form

$$\min\left\{\|x\|_1 : \sum_{j=1}^{n} A_j x_j + B \in \mathcal{S}^m_+\right\}. \tag{1.6}$$

Suppose a feasible solution $x_0$ for this problem is known. Then (1.6) is a special case of (1.1) with $\rho(x) = \|x\|_1$, $\gamma(\cdot) = 0$, $\mathcal{K} = \mathcal{S}^m_+$ and $\chi = \{x \in \mathbb{R}^n : \|x\|_1 \le \|x_0\|_1\}$. The main bottleneck step in solving this problem using the ALCC algorithm reduces to the "shrinkage" problem of the form $\min\{\lambda\|x\|_1 + \|x - \bar{x}\|_2^2 : \|x\|_1 \le \|x_0\|_1\}$ that can be solved very efficiently for any given $\bar{x} \in \mathbb{R}^n$ and $\lambda > 0$.
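As an illustration of why the constrained "shrinkage" problem is cheap, the following Python sketch solves $\min\{\lambda\|x\|_1 + \|x - \bar{x}\|_2^2 : \|x\|_1 \le r\}$ by soft-thresholding. It is not taken from the paper: the function names and the sort-based search for the threshold are assumptions made here for illustration. The reasoning is that the optimality conditions give a soft-thresholded solution with threshold $\lambda/2$ when the ball constraint is inactive, and with a larger threshold chosen so that the $\ell_1$-norm equals $r$ otherwise.

```python
def soft_threshold(v, t):
    """Componentwise soft-thresholding: sign(v_i) * max(|v_i| - t, 0)."""
    out = []
    for vi in v:
        shrunk = max(abs(vi) - t, 0.0)
        out.append(shrunk if vi >= 0 else -shrunk)
    return out


def constrained_shrinkage(x_bar, lam, radius):
    """Sketch: solve min lam*||x||_1 + ||x - x_bar||_2^2 s.t. ||x||_1 <= radius."""
    # Unconstrained minimizer: soft-threshold at lam/2 (the quadratic has no 1/2 factor).
    x = soft_threshold(x_bar, lam / 2.0)
    if sum(abs(xi) for xi in x) <= radius:
        return x
    # Constraint is active: find theta > lam/2 with ||soft_threshold(x_bar, theta)||_1 = radius.
    mags = sorted((abs(v) for v in x_bar), reverse=True)
    running_sum, theta = 0.0, lam / 2.0
    for j, m in enumerate(mags, start=1):
        running_sum += m
        candidate = (running_sum - radius) / j
        # Accept the first j whose candidate threshold zeroes out all smaller entries.
        if j == len(mags) or candidate >= mags[j]:
            theta = candidate
            break
    return soft_threshold(x_bar, theta)
```

The cost is dominated by one sort, so a call takes $O(n \log n)$ operations in this sketch.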

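1.2. Notation. Let $S \subset \mathbb{R}^m$ be a nonempty, closed, convex set. Let $d_S : \mathbb{R}^m \to \mathbb{R}_+$ denote the function

$$d_S(\bar{x}) := \min_{x \in S}\|x - \bar{x}\|_2, \tag{1.7}$$

i.e., $d_S(\bar{x})$ denotes the $\ell_2$-distance of the vector $\bar{x} \in \mathbb{R}^m$ to the set $S$. Let

$$\Pi_S(\bar{x}) := \mathrm{argmin}\{\|x - \bar{x}\|_2 : x \in S\} \tag{1.8}$$

denote the $\ell_2$-projection of the vector $\bar{x} \in \mathbb{R}^m$ onto the set $S$. Since $S \subset \mathbb{R}^m$ is a nonempty, closed, convex set, $\Pi_S(\cdot)$ is well defined. Moreover, $d_S(\bar{x}) = \|\bar{x} - \Pi_S(\bar{x})\|_2$.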
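A minimal Python sketch of the objects in (1.7) and (1.8) for two sets with closed-form projections, the nonnegative orthant and a Euclidean ball. The helper names are hypothetical and chosen only for this illustration; the distance is computed through the identity $d_S(\bar{x}) = \|\bar{x} - \Pi_S(\bar{x})\|_2$.

```python
import math


def project_nonneg_orthant(x):
    """Euclidean projection onto {x : x >= 0}: clip negative entries to zero."""
    return [max(xi, 0.0) for xi in x]


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x||_2 <= radius}: rescale if outside."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]


def distance(x, projection):
    """d_S(x) = ||x - Pi_S(x)||_2 for a set S given through its projection map."""
    p = projection(x)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
```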

1.3. New results. The main results of this paper are as follows: 

(a) Every limit point of the sequence of ALCC primal iterates $\{x_k\}$ is an optimal solution of (1.1).

(b) The sequence of ALCC dual iterates $\{y_k\}$ converges to a KKT point of (1.1).

(c) For all $\epsilon > 0$, the primal ALCC iterates $x_k$ are $\epsilon$-feasible, i.e., $x_k \in \chi$ and $d_\mathcal{K}(Ax_k - b) \le \epsilon$, and $\epsilon$-optimal, i.e., $|p(x_k) - p^*| \le \epsilon$, after at most $O(\log(\epsilon^{-1}))$ ALCC iterations that require solving at most $O(\epsilon^{-1}\log(\epsilon^{-1}))$ problems of the form (1.2).

Since (1.1) is a conic convex programming problem, many special cases of (1.1) can be solved in polynomial time, at least in theory, using interior point methods. However, in practice, interior point methods are not able to solve very large instances of (1.1) because the computational complexity of a matrix factorization step, which is essential in these methods, becomes prohibitive. On the other hand, the computational bottleneck in the ALCC algorithm is the projection (1.2). In many optimization problems that arise in applications, this projection can be solved very efficiently, as is the case with the noisy compressed sensing and matrix completion problems discussed in [2], and the convex optimization problems with semidefinite constraints discussed above. The convergence results above imply that the ALCC algorithm can solve very large instances of (1.1) very efficiently provided the corresponding projection (1.2) can be solved efficiently. The numerical results reported in [1, 2] for a special case of the ALCC algorithm provide evidence that our proposed algorithm can be scaled to solve very large instances of the conic problem (1.1).

1.4. Previous work. Rockafellar [13] proposed an inexact augmented Lagrangian method to solve problems of the form

$$p^* = \min\{p(x) : f(x) \ge 0,\ x \in \chi\}, \tag{1.9}$$

where $\chi \subset \mathbb{R}^n$ is a closed convex set, $p : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a convex function and $f : \mathbb{R}^n \to \mathbb{R}^m$ is such that each component $f_i(x)$ of $f = (f_1, \ldots, f_m)$ is a concave function for $i = 1, \ldots, m$. Rockafellar [13] defined the "penalty" Lagrangian

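$$\mathcal{L}_\mu(x, y) := p(x) + \frac{\mu}{2}\left\|\left(\frac{y}{\mu} - f(x)\right)_+\right\|_2^2 - \frac{\|y\|_2^2}{2\mu}, \tag{1.10}$$

where $(\cdot)_+ := \max\{\cdot, 0\}$ and $\max\{\cdot, \cdot\}$ are componentwise operators, and $\mu$ is a fixed penalty parameter.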
Rockafellar [13] established that given $y_0 \in \mathbb{R}^m$, the primal-dual iterate sequence $\{x_k, y_k\} \subset \chi \times \mathbb{R}^m$ computed according to

$$\mathcal{L}_\mu(x_k, y_k) \le \inf_{x \in \chi}\mathcal{L}_\mu(x, y_k) + \alpha_k, \tag{1.11}$$
$$y_{k+1} = (y_k - \mu f(x_k))_+, \tag{1.12}$$

satisfies $\lim_{k \in \mathbb{Z}_+} p(x_k) = p^*$ and $\limsup_{k \in \mathbb{Z}_+} -f_i(x_k) \le 0$ when (1.9) has a KKT point and the parameter sequence $\{\alpha_k\}$ satisfies the summability condition $\sum_{k=1}^{\infty}\sqrt{\alpha_k} < \infty$. Martinet [9] later showed that the summability condition on the parameter sequence is not necessary. However, in both [9, 13] no iteration complexity result was given for the algorithm (1.11)-(1.12) when $p$ was not twice continuously differentiable.

In this paper we show convergence rate results for an augmented Lagrangian algorithm where we allow the penalty parameter $\mu$ to be a non-decreasing positive sequence $\{\mu_k\}$. After we had independently established these results, which are extensions of our previous results in [2], we became aware of a previous work by Rockafellar [14] where he proposed several different variants of the algorithm in (1.11)-(1.12) where $\mu$ could be updated between iterations. Rockafellar [13] established that for all non-decreasing positive multiplier sequences $\{\mu_k\}$ satisfying the summability condition $\sum_{k=1}^{\infty}\sqrt{\mu_k\alpha_k} < \infty$, $\{y_k\}$ is bounded and any limit point of $\{x_k\}$ is optimal to (1.9); moreover,

inax {/j(xfe)} < -^^^-t^^ — p{xk) -p* < Tr—iak + \\yk\\l)- (1-13) 

1=1, ...,m fik ^fJ-k 

Note that the results in [13] only provide an upper bound on the sub-optimality; no lower bound is provided. Since the iterates $\{x_k\}$ are only feasible in the limit, it is possible that $p(x_k) < p^*$, and establishing a lower bound on the sub-optimality is critical. Moreover, Rockafellar [14] does not discuss how to compute iterates satisfying (1.11) and assumes that a black-box oracle produces such iterates; consequently, there are no basic operation level complexity bounds in [14].

In this paper, we extend (1.9) to a conic convex program where $f(x) = Ax - b$, and $\mathcal{K}$ is a closed, convex cone. We show that the primal ALCC iterates $\{x_k\} \subset \chi$ satisfy $d_\mathcal{K}(Ax_k - b) \le O(\mu_k^{-1})$ and $|p(x_k) - p^*| \le O(\mu_k^{-1})$, i.e., we provide both an upper and a lower bound, using an inexact stopping condition that is an extension of (1.11). The ALCC algorithm calls an optimal first-order method, such as FISTA [3], to compute an iterate $x_k$ satisfying a stopping condition similar to (1.11). By carefully selecting the sub-optimality parameter sequence $\{\alpha_k\}$ and the penalty parameter sequence $\{\mu_k\}$, we are able to establish a bound on the number of generalized projections of the form (1.2) required to obtain an $\epsilon$-feasible and $\epsilon$-optimal solution to (1.1), and also provide an operation level complexity bound.

In [14], Rockafellar also provides an iteration complexity result for a different inexact augmented Lagrangian method. Given a non-increasing sequence $\{\alpha_k\}$ and a non-decreasing sequence $\{\mu_k\}$ such that $\sum_{k=1}^{\infty}\sqrt{\mu_k\alpha_k} < \infty$, the infeasibility and suboptimality can be upper bounded (see (1.13)) when the duals $\{y_k\}$ are updated according to (1.12) and the primal iterates $\{x_k\}$ satisfy the subgradient stopping condition (1.14) for $\phi_k$, where $\phi_k(x) := \mathcal{L}_{\mu_k}(x, y_k) + \mathbf{1}_\chi(x) + \frac{1}{2\mu_k}\|x - x_{k-1}\|_2^2$, $\mathcal{L}_{\mu_k}$ is defined in (1.10) and $\mathbf{1}_\chi$ is the indicator function of the closed convex set $\chi$. With this new stopping condition, Rockafellar [14] was able to establish a lower bound $p(x_k) - p^* \ge -O(\mu_k^{-1})$. Note that the stopping condition (1.14) is much stronger than (1.11); in this paper we establish the lower bound using the weaker stopping condition (1.11).

First-order methods for minimizing functions with Lipschitz continuous gradients [10, 11] (and also the non-smooth variants [3, 17]) can only guarantee convergence in function values; therefore, the subgradient condition (1.14) has to be re-stated in terms of function values in order to use a first-order algorithm to compute the iterates. This is impossible when the objective function is non-smooth. Therefore, one cannot establish operational level complexity results for a method that uses the gradient stopping condition (1.14) with first-order methods. Next, consider the case where $p$ is smooth, i.e., $\rho(\cdot) = 0$. Suppose $\chi = \mathbb{R}^n$, $\nabla\gamma$ is Lipschitz continuous with constant $L_\gamma$ and $f(x) = Ax - b$. Then, it is easy to establish that $\nabla\phi_k$ is also Lipschitz continuous with Lipschitz constant $L_{\phi_k} = L_\gamma + \mu_k\sigma^2_{\max}(A) + \mu_k^{-1} = O(\mu_k)$. Since $\phi_k(x_k) -$
infj;^^.! 4>k{x) < ^ implies that || V0fc(a;fc)||2 < y^2L^, in order to ensure ()1.14p one has to set ^ < 20-^ ^ (A) ^- 
Thus, the complexity of computing each iterate $x_k$ satisfying (1.14) will be significantly higher than the complexity of computing $x_k$ satisfying (1.11), which is the one used in the ALCC algorithm. Therefore, although Rockafellar's method using (1.14) has the same iteration complexity as the ALCC algorithm, the operational level complexity of a first-order algorithm based on the gradient stopping criterion (1.14) will be significantly higher than the complexity of the ALCC algorithm where $\epsilon = \alpha_k$. In summary, Rockafellar [14] is only able to show an upper bound on the sub-optimality of iterates for the stopping criterion (1.11) that leads to an efficient algorithm; whereas the subgradient stopping criterion (1.14) that results in a lower bound is not practical for a first-order algorithm.

In [6], Lan, Lu and Monteiro consider problems of the form

$$\min\{\langle c, x\rangle : Ax = b,\ x \in \mathcal{K}\}, \tag{1.15}$$

where $\mathcal{K}$ is a closed convex cone. They proposed computing an approximate solution for (1.15) by minimizing the Euclidean distance to the set of KKT points using Nesterov's accelerated proximal gradient algorithm (APG) [10, 11]. They show that at most $O(\epsilon^{-1})$ iterations of Nesterov's APG algorithm [10, 11] suffice to compute a point whose distance to the set of KKT points is at most $\epsilon > 0$. In [8], Lan and Monteiro proposed a first-order penalty method to solve the following more general problem

$$\min\{\gamma(x) : Ax - b \in \mathcal{K},\ x \in \chi\}, \tag{1.16}$$

where $\gamma$ is a convex function with Lipschitz continuous gradient, $\mathcal{K}$ is a closed, convex cone, $\chi$ is a simple convex compact set and $A \in \mathbb{R}^{m \times n}$. In order to solve (1.16), they used Nesterov's APG algorithm on a perturbed penalty problem defined in terms of a point $x_0 \in \chi$, the distance function $d_\mathcal{K}$ defined in (1.7), and fixed positive perturbation and penalty parameters. They showed that Nesterov's APG algorithm can compute a primal-dual solution $(\tilde{x}, \tilde{y}) \in \chi \times \mathcal{K}^*$ satisfying the $\epsilon$-perturbed KKT conditions

$$\langle \tilde{y}, \Pi_\mathcal{K}(A\tilde{x} - b)\rangle = 0, \quad d_\mathcal{K}(A\tilde{x} - b) \le \epsilon, \quad \nabla\gamma(\tilde{x}) - A^T\tilde{y} \in -\mathcal{N}_\chi(\tilde{x}) + \mathcal{B}(\epsilon), \tag{1.17}$$

using $O(\epsilon^{-1}\log(\epsilon^{-1}))$ projections onto $\mathcal{K}$ and $\chi$, where $\mathcal{N}_\chi(\tilde{x}) := \{s \in \mathbb{R}^n : \langle s, x - \tilde{x}\rangle \le 0,\ \forall x \in \chi\}$ and $\mathcal{B}(\epsilon) := \{x \in \mathbb{R}^n : \|x\|_2 \le \epsilon\}$. Note that since the perturbation and penalty parameters are fixed, additional iterations of Nesterov's APG algorithm will not improve the quality of the solution.

The optimization problem (1.16) is a special case of (1.1) with $\rho(\cdot) = 0$. Thus, ALCC can solve (1.16). We show that every limit point of the ALCC iterates is optimal for (1.16). Furthermore, for any $\epsilon > 0$, the ALCC iterates are $\epsilon$-optimal and $\epsilon$-feasible for (1.16) within $O(\epsilon^{-1}\log(\epsilon^{-1}))$ projections onto $\mathcal{K}$ and $\chi$, as is the case with the algorithm proposed in [8].

Lan and Monteiro [7] proposed an inexact augmented Lagrangian method to solve a special case of (1.1) with $\mathcal{K} = \{0\}$ and $\rho(\cdot) = 0$, and showed that Nesterov's APG algorithm can compute a primal-dual solution $(\tilde{x}, \tilde{y}) \in \chi \times \mathbb{R}^m$ satisfying (1.17) using $O\left(\epsilon^{-1}(\log(\epsilon^{-1}))^{3/4}\log\log(\epsilon^{-1})\right)$ projections onto $\chi$ and $\mathcal{K}$.

Aybat and Iyengar [2] proposed an inexact augmented Lagrangian algorithm (FALC) to solve the composite norm minimization problem

$$\min\{\mu_1\|\sigma(\mathcal{F}(X) - G)\|_\alpha + \mu_2\|\mathcal{C}(X) - d\|_\beta + \gamma(X) : \mathcal{A}(X) - b \in Q\}, \tag{1.18}$$

where the function $\sigma(\cdot)$ returns the singular values of its argument; $\alpha$ and $\beta \in \{1, 2, \infty\}$; $\mathcal{A}$, $\mathcal{C}$, $\mathcal{F}$ are linear operators such that either $\mathcal{C}$ or $\mathcal{F}$ is injective, and $\mathcal{A}$ is surjective; $\gamma$ is a convex function with a Lipschitz continuous gradient and $Q$ is a closed convex set. It was shown that any limit point of the FALC iterates is an optimal solution of the composite norm minimization problem (1.18); and for all $\epsilon > 0$, the FALC iterates are $\epsilon$-feasible and $\epsilon$-optimal after $O(\log(\epsilon^{-1}))$ FALC iterations, which require $O(\epsilon^{-1})$ shrinkage type operations and Euclidean projections onto the set $Q$. The limitation of FALC is that it requires $\mathcal{A}$ to be a surjective mapping. Consider a feasible set of the form

$$\{x \in \mathbb{R}^n : A_1 x - b_1 \in \mathcal{K}_1,\ A_2 x - b_2 \in \mathcal{K}_2,\ x \in \chi\}, \tag{1.19}$$

where $\mathcal{K}_i$ is a closed convex cone, $A_i \in \mathbb{R}^{m_i \times n}$ and $b_i \in \mathbb{R}^{m_i}$ for $i = 1, 2$. The set in (1.19) can be reformulated as the feasible set in (1.1) by choosing $A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}$, $b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ and $\mathcal{K} = \mathcal{K}_1 \times \mathcal{K}_2$, where $m = m_1 + m_2$. FALC can work with such a set only if $A$ has linearly independent rows, i.e., $\mathrm{rank}(A) = m_1 + m_2$. This is a severe limitation for practical problems. On the other hand, the ALCC algorithm works for feasible sets of the form (1.19) without any additional assumption. Thus, ALCC can be used to solve a much larger class of optimization problems.

In our opinion the ALCC algorithm proposed in this paper unifies all the previous work on fast first-order penalty and/or augmented Lagrangian algorithms for solving optimization problems that are special cases of (1.1). We do not impose any regularity conditions on the constraint matrix $A$, and the projection step (1.2) is the natural extension of the gradient projection step. We believe that this unified treatment will spur further research in understanding the limits of performance of first-order algorithms for general conic problems.

2. Preliminaries. In Section 2.1, we first briefly discuss a variant of Nesterov's APG algorithm [10, 11] to solve (1.1) without conic constraints. Next, we introduce a dual function for the conic problem in (1.1) and establish some of its properties in Section 2.2. The definitions and the results of Section 2.2 are extensions of the corresponding definitions and results in [12, 13] to the case where $\mathcal{K} \subset \mathbb{R}^m$ is a general closed, convex cone.

2.1. Accelerated Proximal Gradient (APG) algorithm. In this section we state and briefly discuss the details of a particular implementation of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [3], which extends Nesterov's accelerated proximal gradient algorithm [10, 11] for minimizing smooth convex functions over simple convex sets, to solve non-smooth convex minimization problems.

Algorithm APG($\rho$, $\gamma$, $\chi$, $x_0$, STOP)

1: $x^{(1)}_0 \leftarrow x_0$, $x^{(2)}_1 \leftarrow x_0$, $t_1 \leftarrow 1$, $\ell \leftarrow 1$
2: while STOP is false do
3: $x^{(1)}_\ell \leftarrow \mathrm{argmin}\left\{\rho(x) + \left\langle \nabla\gamma\left(x^{(2)}_\ell\right), x - x^{(2)}_\ell\right\rangle + \frac{L_\gamma}{2}\left\|x - x^{(2)}_\ell\right\|_2^2 : x \in \chi\right\}$
4: $t_{\ell+1} \leftarrow \left(1 + \sqrt{1 + 4t_\ell^2}\right)/2$
5: $x^{(2)}_{\ell+1} \leftarrow x^{(1)}_\ell + \frac{t_\ell - 1}{t_{\ell+1}}\left(x^{(1)}_\ell - x^{(1)}_{\ell-1}\right)$
6: $\ell \leftarrow \ell + 1$
7: end while

Fig. 2.1: Accelerated Proximal Gradient Algorithm

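FISTA computes an $\epsilon$-optimal solution to $\min\{\rho(x) + \gamma(x) : x \in \mathbb{R}^n\}$ in $O\left(\sqrt{L_\gamma/\epsilon}\right)$ iterations, where $\rho : \mathbb{R}^n \to \mathbb{R}$ and $\gamma : \mathbb{R}^n \to \mathbb{R}$ are continuous convex functions such that $\nabla\gamma$ is Lipschitz continuous on $\mathbb{R}^n$ with constant $L_\gamma$. Tseng [17] showed that this rate result for FISTA also holds when $\rho : \mathbb{R}^n \to (-\infty, +\infty]$ and $\gamma : \mathbb{R}^n \to (-\infty, +\infty]$ are proper, lower semicontinuous, and convex functions such that $\mathrm{dom}\,\rho$ is closed and $\nabla\gamma$ is Lipschitz continuous on $\mathbb{R}^n$.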

This extended version of FISTA is displayed in Figure 2.1 as the APG algorithm. Hence, FISTA can solve constrained problems of the form

$$\min\{\rho(x) + \gamma(x) : x \in \chi\}, \tag{2.1}$$

where $\chi \subset \mathbb{R}^n$ is a simple closed convex set.

The APG algorithm displayed in Figure 2.1 takes as input the functions $\rho$ and $\gamma$, the simple closed convex set $\chi \subset \mathbb{R}^n$, an initial iterate $x_0 \in \chi$ and a stopping criterion STOP. Lemma 2.1 gives the iteration complexity of the APG algorithm.

Lemma 2.1. Let $\rho$ and $\gamma$ be proper, closed, convex functions such that $\mathrm{dom}\,\rho$ is closed and $\nabla\gamma$ is Lipschitz continuous on $\mathbb{R}^n$ with constant $L_\gamma$. Fix $\epsilon > 0$ and let $\{x^{(1)}_\ell, x^{(2)}_\ell\}$ denote the sequence of iterates computed by the APG algorithm when STOP is disabled. Then $\rho(x^{(1)}_\ell) + \gamma(x^{(1)}_\ell) \le \min\{\rho(x) + \gamma(x) : x \in \chi\} + \epsilon$ whenever $\ell \ge \sqrt{\frac{2L_\gamma}{\epsilon}}\,\|x^* - x_0\|_2 - 1$, where $x^* \in \mathrm{argmin}\{\rho(x) + \gamma(x) : x \in \chi\}$.
Proof. See Corollary 3 in [17] and Theorem 4.4 in [3] for the details of the proof. □
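The following Python sketch mirrors the structure of the APG iteration in Figure 2.1 (a FISTA-style scheme). It is an illustration under assumptions, not the authors' implementation: the caller supplies a hypothetical `prox_step` callback that solves the subproblem in the argmin step, and the stopping criterion is replaced by a fixed iteration count. The example instance takes $\rho = 0$, a separable quadratic $\gamma$, and a box for $\chi$, so that the subproblem is a clipped gradient step.

```python
import math


def apg(prox_step, grad_gamma, lipschitz, x0, num_iters):
    """FISTA-style accelerated proximal gradient with a fixed iteration budget."""
    x_prev = list(x0)   # plays the role of x^{(1)}_{l-1}
    z = list(x0)        # plays the role of x^{(2)}_l
    t = 1.0
    for _ in range(num_iters):
        # Solve the regularized linearization of gamma around z over the simple set.
        x = prox_step(z, grad_gamma(z), lipschitz)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Extrapolation (momentum) step.
        z = [xi + (t - 1.0) / t_next * (xi - pi) for xi, pi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Example instance (assumed for illustration): gamma(x) = 0.5 * sum d_i (x_i - c_i)^2,
# rho = 0, chi = [0, 1]^2, so the argmin step is a projected gradient step.
D = [1.0, 4.0]
C = [2.0, 0.3]


def grad_gamma_example(x):
    return [d * (xi - c) for d, xi, c in zip(D, x, C)]


def box_prox_step(z, grad, lipschitz):
    return [min(max(zi - gi / lipschitz, 0.0), 1.0) for zi, gi in zip(z, grad)]
```

With `lipschitz = max(D)`, calling `apg(box_prox_step, grad_gamma_example, 4.0, [0.0, 0.0], 200)` should return a point close to the clipped vector `[1.0, 0.3]`, which minimizes this separable example.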

2.2. A dual function for conic convex programs and its properties. For all $\mu \ge 0$, optimization problem (P) in (1.1) is equivalent to

$$\min\left\{p(x) + \frac{\mu}{2}\|Ax - s - b\|_2^2 : Ax - s = b,\ x \in \chi,\ s \in \mathcal{K}\right\}. \tag{2.2}$$

Let $y \in \mathbb{R}^m$ denote a Lagrangian dual variable corresponding to the equality constraint in (2.2), and let

$$\mathcal{L}_\mu(x, y) := \min_{s \in \mathcal{K}}\left\{p(x) - \langle y, Ax - s - b\rangle + \frac{\mu}{2}\|Ax - s - b\|_2^2\right\} \tag{2.3}$$

denote the "penalty" Lagrangian function for (2.2) with $\mathrm{dom}\,\mathcal{L}_\mu = \chi \times \mathbb{R}^m$. For $\mu > 0$,

$$\mathcal{L}_\mu(x, y) = p(x) + \frac{\mu}{2}\min_{s \in \mathcal{K}}\left\|Ax - s - b - \frac{y}{\mu}\right\|_2^2 - \frac{\|y\|_2^2}{2\mu} = p(x) + \frac{\mu}{2}\,d_\mathcal{K}\left(Ax - b - \frac{y}{\mu}\right)^2 - \frac{\|y\|_2^2}{2\mu}, \tag{2.4}$$

where $d_\mathcal{K}(\cdot)$ is the distance function defined in (1.7). When $\mu = 0$, the definition in (2.3) implies that

$$\mathcal{L}_0(x, y) = \begin{cases} p(x) - \langle y, Ax - b\rangle, & \text{if } y \in \mathcal{K}^*, \\ -\infty, & \text{otherwise.} \end{cases} \tag{2.5}$$

For $\mu \ge 0$, we define a dual function $g_\mu : \mathbb{R}^m \to \mathbb{R}$ for (1.1) such that

$$g_\mu(y) := \inf_{x \in \chi}\mathcal{L}_\mu(x, y). \tag{2.6}$$

Note that from (2.5) it follows that $g_0$ is the Lagrangian dual function of (P).

The definitions above and the results detailed below are immediate extensions of corresponding definitions and results in [12], given for $\mathcal{K} = \mathbb{R}^m_+$, to the case where $\mathcal{K}$ is a general closed convex cone. We state and prove the extensions here for the sake of completeness. These results are used in Section 3 to establish the convergence properties of the ALCC iterate sequence.
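For reference, the passage from (2.3) to (2.4) is a completing-the-square step. The short derivation below is a sketch written with the shorthand $u := Ax - s - b$, which is introduced here only for illustration.

```latex
\begin{aligned}
-\langle y, u\rangle + \frac{\mu}{2}\|u\|_2^2
  &= \frac{\mu}{2}\Big\|u - \frac{y}{\mu}\Big\|_2^2 - \frac{\|y\|_2^2}{2\mu},
  \qquad u := Ax - s - b,\ \ \mu > 0, \\
\min_{s \in \mathcal{K}} \frac{\mu}{2}\Big\|Ax - b - \frac{y}{\mu} - s\Big\|_2^2
  &= \frac{\mu}{2}\, d_{\mathcal{K}}\Big(Ax - b - \frac{y}{\mu}\Big)^2 .
\end{aligned}
```

The first line is an algebraic identity; the second uses the definition of the distance function in (1.7). Adding $p(x)$ and the constant $-\|y\|_2^2/(2\mu)$ gives the expression in (2.4).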

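Lemma 2.2. For all $\mu \ge 0$, $x \in \chi$ and $y \in \mathbb{R}^m$, $\mathcal{L}_\mu$ defined in (2.3) satisfies

$$\mathcal{L}_\mu(x, y) = \inf_{u \in \mathbb{R}^m}\{F_\mu(x, u) - \langle y, u\rangle\}, \tag{2.7}$$

where $F_\mu : \chi \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is defined as follows

$$F_\mu(x, u) := \begin{cases} p(x) + \frac{\mu}{2}\|u\|_2^2, & \text{if } Ax - b \in \mathcal{K} + u, \\ +\infty, & \text{otherwise.} \end{cases} \tag{2.8}$$

Hence, $\mathcal{L}_\mu(x, y)$ is convex in $x \in \chi$, concave in $y \in \mathbb{R}^m$, and $g_\mu(y)$ defined in (2.6) is concave in $y \in \mathbb{R}^m$.

Proof. The representation in (2.7) trivially follows from the definition of $F_\mu$ in (2.8). For a fixed $x \in \chi$, (2.3) implies that $\mathcal{L}_\mu(x, y)$ is the infimum of affine functions of $y$, hence $\mathcal{L}_\mu(x, y)$ is concave in $y$. Hence, $g_\mu$ defined in (2.6) is the infimum of concave functions; therefore, it is also concave. For a fixed $y \in \mathbb{R}^m$, when $\mu > 0$, convexity of $\mathcal{L}_\mu(x, y)$ in $x$ follows from (2.4) and the fact that $p(\cdot)$ and $d_\mathcal{K}(\cdot)$ are convex functions; otherwise, when $\mu = 0$, it trivially follows from (2.5). □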

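Lemma 2.3. Let $g : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ be a proper closed convex function. For $\mu > 0$, let

$$\psi_\mu(y) := \min_{z \in \mathbb{R}^m}\left\{g(z) + \frac{1}{2\mu}\|z - y\|_2^2\right\}, \qquad \pi_\mu(y) := \mathop{\mathrm{argmin}}_{z \in \mathbb{R}^m}\left\{g(z) + \frac{1}{2\mu}\|z - y\|_2^2\right\}$$

denote the Moreau regularization of $g$ and the proximal map corresponding to $g$, respectively. Then, for all $y_1, y_2 \in \mathbb{R}^m$,

$$\|\pi_\mu(y_1) - \pi_\mu(y_2)\|_2^2 + \|\pi^c_\mu(y_1) - \pi^c_\mu(y_2)\|_2^2 \le \|y_1 - y_2\|_2^2, \tag{2.9}$$

where $\pi^c_\mu(y) := y - \pi_\mu(y)$ for all $y \in \mathbb{R}^m$. Moreover, $\psi_\mu : \mathbb{R}^m \to \mathbb{R}$ is an everywhere finite, differentiable convex function such that

$$\nabla\psi_\mu(y) = \frac{1}{\mu}\left(y - \pi_\mu(y)\right) \tag{2.10}$$

is Lipschitz continuous with constant $\frac{1}{\mu}$.

Proof. The proof of (2.9) is given in [15] and the rest of the claims including (2.10) are shown in [5]. □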
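A one-dimensional Python sketch of the objects in Lemma 2.3 for the particular choice $g(z) = |z|$, which is an assumption made here for illustration. For this $g$ the proximal map is soft-thresholding and the Moreau regularization is the Huber function, so the gradient formula (2.10) can be checked numerically. The function names are hypothetical.

```python
def prox_abs(y, mu):
    """Proximal map pi_mu(y) of g(z) = |z|: soft-thresholding at level mu."""
    shrunk = max(abs(y) - mu, 0.0)
    return shrunk if y >= 0 else -shrunk


def moreau_abs(y, mu):
    """Moreau regularization psi_mu(y) = min_z |z| + (z - y)^2 / (2 mu)."""
    z = prox_abs(y, mu)
    return abs(z) + (z - y) ** 2 / (2.0 * mu)


def grad_moreau_abs(y, mu):
    """Gradient formula (2.10): (y - pi_mu(y)) / mu."""
    return (y - prox_abs(y, mu)) / mu
```

For example, with $\mu = 0.5$ the regularization equals $y^2/(2\mu)$ for $|y| \le \mu$ and $|y| - \mu/2$ otherwise, and its slope never exceeds 1 in absolute value.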
Theorem 2.4. Suppose Assumption 1.1 holds. Then, for any $\mu > 0$, $g_\mu$ is an everywhere finite, continuously differentiable concave function and $g_\mu$ achieves its maximum value at any KKT point. Moreover,

$$g_\mu(y) = \max_{z \in \mathbb{R}^m}\left\{g_0(z) - \frac{1}{2\mu}\|z - y\|_2^2\right\}, \tag{2.11}$$

$$\nabla g_\mu(y) = -\frac{1}{\mu}\left(y - \pi_\mu(y)\right) \tag{2.12}$$

is Lipschitz continuous with Lipschitz constant equal to $\frac{1}{\mu}$, where $\pi_\mu(y) \in \mathcal{K}^*$ denotes the unique maximizer in (2.11).

Proof. Fix $\mu \ge 0$, define

$$h_\mu(u) := \inf_{x \in \chi} F_\mu(x, u). \tag{2.13}$$

Note that $F_\mu(x, u) = p(x) + \frac{\mu}{2}\|u\|_2^2 + \mathbf{1}_\mathcal{K}(Ax - b - u)$, where $\mathbf{1}_\mathcal{K}(\cdot)$ denotes the indicator function of the set $\mathcal{K}$; therefore, $F_\mu(x, u)$ is convex in $(x, u)$. Since $F_\mu$ is convex in $(x, u)$, $\chi$ is a convex set and $h_\mu(0) = \inf_{x \in \chi}\{p(x) + \mathbf{1}_\mathcal{K}(Ax - b)\} = p^* > -\infty$, it follows that $h_\mu$ is a convex function such that $h_\mu(\cdot) > -\infty$ [4]. From the definition of $F_\mu$, it follows that for all $u \in \mathbb{R}^m$,

$$h_\mu(u) = h_0(u) + \mu\,\omega(u),$$

where $\omega(u) := \frac{1}{2}\|u\|_2^2$. Substituting (2.7) in (2.6), for all $\mu \ge 0$, we get

$$g_\mu(y) = \inf_{u \in \mathbb{R}^m}\{h_\mu(u) - \langle y, u\rangle\} = -h_\mu^*(y),$$

where $h_\mu^*$ denotes the conjugate of the convex function $h_\mu$.

Fix $\mu > 0$. Since $h_\mu$ is a sum of two convex functions, it follows from Theorem 16.4 in [16] that

$$g_\mu(y) = -(h_0 + \mu\omega)^*(y) = -\min_{z \in \mathbb{R}^m}\left\{h_0^*(z) + \mu\,\omega^*\left(\frac{y - z}{\mu}\right)\right\}. \tag{2.14}$$

Since $h_0^* = -g_0$ and $\omega^* = \omega$, the result (2.11) immediately follows from (2.14).

Note that (2.11) shows that $-g_\mu$ is the Moreau regularization of $-g_0$. Therefore, Lemma 2.3 and (2.11) imply that $g_\mu$ is an everywhere finite, differentiable concave function such that $\nabla g_\mu$ is given in (2.12).

Let $y^*$ be a KKT point of (1.1). Note that $\pi_\mu(y^*) = y^*$. Hence $\nabla g_\mu(y^*) = 0$. Concavity of $g_\mu$ implies that $y^* \in \mathrm{argmax}_{y \in \mathbb{R}^m}\, g_\mu(y)$ for any KKT point $y^*$. □

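Theorem 2.5. Fix $\mu > 0$ and $\bar{y} \in \mathbb{R}^m$. Suppose $\bar{x} \in \chi$ is a $\xi$-optimal solution to $\min_{x \in \chi}\mathcal{L}_\mu(x, \bar{y})$, i.e., $\mathcal{L}_\mu(\bar{x}, \bar{y}) \le \min\{\mathcal{L}_\mu(x, \bar{y}) : x \in \chi\} + \xi = g_\mu(\bar{y}) + \xi$. Then

$$\mu\,\|\nabla_y\mathcal{L}_\mu(\bar{x}, \bar{y}) - \nabla g_\mu(\bar{y})\|_2^2 \le 2\xi. \tag{2.15}$$

Proof. For $\mu > 0$, $g_\mu$ is concave and $\nabla g_\mu$ is Lipschitz continuous with Lipschitz constant equal to $\frac{1}{\mu}$; therefore,

$$g_\mu(y) \ge g_\mu(\bar{y}) + \langle\nabla g_\mu(\bar{y}), y - \bar{y}\rangle - \frac{1}{2\mu}\|y - \bar{y}\|_2^2 \tag{2.16}$$

for all $y \in \mathbb{R}^m$. Moreover, since for every $x \in \chi$, $\mathcal{L}_\mu(x, y)$ is concave in $y$, it follows that for all $y \in \mathbb{R}^m$

$$\mathcal{L}_\mu(\bar{x}, \bar{y}) + \langle\nabla_y\mathcal{L}_\mu(\bar{x}, \bar{y}), y - \bar{y}\rangle \ge \mathcal{L}_\mu(\bar{x}, y) \ge g_\mu(y). \tag{2.17}$$

Combining (2.16), (2.17) and the fact that $\bar{x}$ is $\xi$-optimal and $y$ is arbitrary, we get

$$\xi \ge \sup_{y \in \mathbb{R}^m}\left\{\langle\nabla g_\mu(\bar{y}) - \nabla_y\mathcal{L}_\mu(\bar{x}, \bar{y}), y - \bar{y}\rangle - \frac{1}{2\mu}\|y - \bar{y}\|_2^2\right\} = \frac{\mu}{2}\|\nabla g_\mu(\bar{y}) - \nabla_y\mathcal{L}_\mu(\bar{x}, \bar{y})\|_2^2.$$

□

3. ALCC Algorithm. In order to solve (P) given in (1.1), we inexactly solve the sequence of sub-problems:

$$(SP_k): \quad \min_{x \in \chi} P_k(x, y_k), \tag{3.1}$$

where

$$P_k(x, y) := \frac{1}{\mu_k}\,p(x) + \frac{1}{2}\,d_\mathcal{K}\left(Ax - b - \frac{y}{\mu_k}\right)^2,$$

which, by (2.4), coincides with $\frac{1}{\mu_k}\mathcal{L}_{\mu_k}(x, y)$ up to the term $\|y\|_2^2/(2\mu_k^2)$ that does not depend on $x$.

Algorithm ALCC($x_0$, $\{\alpha_k, \eta_k, \mu_k\}$)

$y_1 \leftarrow 0$, $k \leftarrow 1$
while $k \ge 1$ do
$x_k \leftarrow$ ORACLE($P_k$, $y_k$, $\alpha_k$, $\eta_k$, $\mu_k$) /* See Section 3.1 for ORACLE */
$y_{k+1} \leftarrow \mu_k\left[\Pi_\mathcal{K}\left(Ax_k - b - \frac{y_k}{\mu_k}\right) - \left(Ax_k - b - \frac{y_k}{\mu_k}\right)\right]$
$k \leftarrow k + 1$
end while

Fig. 3.1: Augmented Lagrangian Algorithm for Conic Convex Programming

For notational convenience, we define

$$f_k(x, y) := \frac{1}{2}\,d_\mathcal{K}\left(Ax - b - \frac{y}{\mu_k}\right)^2.$$

Therefore, $P_k(x, y) = \frac{1}{\mu_k}\,p(x) + f_k(x, y)$. The specific choices of the penalty parameter and Lagrangian dual sequences, $\{\mu_k\}$ and $\{y_k\}$, are discussed later in this section.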

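Lemma 3.1. For all $k \ge 1$ and $y \in \mathbb{R}^m$, $f_k(x, y)$ is convex in $x$. Moreover,

$$\nabla_x f_k(x, y) = A^T\left(Ax - b - \frac{y}{\mu_k} - \Pi_\mathcal{K}\left(Ax - b - \frac{y}{\mu_k}\right)\right), \tag{3.2}$$

and $\nabla_x f_k(x, y)$ is Lipschitz continuous in $x$ with constant $L = \sigma^2_{\max}(A)$.
Proof. See appendix for the proof. □

The ALCC algorithm is displayed in Figure 3.1. The inputs to ALCC are an initial point $x_0 \in \chi$ and a parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ such that

$$\alpha_k \searrow 0, \quad \eta_k \searrow 0, \quad 0 < \mu_k \nearrow \infty. \tag{3.3}$$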

3.1. Oracle. The subroutine ORACLE($P$, $y$, $\alpha$, $\eta$, $\mu$) returns $x \in \chi$ such that $x$ satisfies one of the following two conditions:

$$0 \le P(x, y) - \inf_{x \in \chi} P(x, y) \le \frac{\alpha}{\mu}, \tag{3.4}$$

$$\exists\, q \in \partial_x P(x, y) + \partial_x \mathbf{1}_\chi(x) \ \text{ s.t. } \ \|q\|_2 \le \frac{\eta}{\mu}, \tag{3.5}$$

where $\mathbf{1}_\chi(\cdot)$ denotes the indicator function of the set $\chi$.

Let $\rho_k(x) := \frac{1}{\mu_k}\rho(x)$ and $\gamma_k(x) := \frac{1}{\mu_k}\gamma(x) + f_k(x, y_k)$. Then $\nabla\gamma_k$ exists and is Lipschitz continuous with Lipschitz constant

$$L_{\gamma_k} := \frac{1}{\mu_k}L_\gamma + \sigma^2_{\max}(A). \tag{3.6}$$

Let

$$\chi_k^* := \mathop{\mathrm{argmin}}_{x \in \chi} P_k(x, y_k) \tag{3.7}$$

denote the set of optimal solutions to $(SP_k)$. Then, Lemma 2.1 guarantees that the APG algorithm with the initial iterate $x_{k-1} \in \chi$ requires at most

$$\ell_{\max}(k) := \sqrt{\frac{2\mu_k L_{\gamma_k}}{\alpha_k}}\; d_{\chi_k^*}(x_{k-1}) \tag{3.8}$$

iterations to compute an $\frac{\alpha_k}{\mu_k}$-optimal solution to the $k$-th subproblem $(SP_k)$ in (3.1). Thus, setting the stopping criterion STOP $= \{\ell \ge \ell_{\max}(k)\}$ ensures that the output of the APG algorithm satisfies (3.4). Thus, we have shown that there exists a subroutine ORACLE($P_k$, $y_k$, $\alpha_k$, $\eta_k$, $\mu_k$) that can compute $x_k$ satisfying either (3.4) or (3.5). As indicated earlier, the computational complexity of each iteration in the APG algorithm is dominated by the complexity of computing the solution to (1.2).
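To make the outer loop of Figure 3.1 concrete, here is a small Python sketch for the special case $\rho = 0$, $\mathcal{K} = \mathbb{R}^m_+$ (so the constraint reads $Ax - b \ge 0$) and $\chi$ a box. Several simplifications are assumptions of this sketch and not part of the paper: the ORACLE is replaced by a fixed number of plain projected-gradient steps rather than APG with the stopping rules (3.4) or (3.5), $\sigma^2_{\max}(A)$ is replaced by the squared Frobenius norm as a safe upper bound, and the penalty is updated as $\mu_{k+1} = \beta\mu_k$. The function name `alcc_sketch` and its parameters are hypothetical.

```python
def alcc_sketch(grad_gamma, L_gamma, A, b, lo, hi, x0,
                mu0=1.0, beta=2.0, outer_iters=15, inner_iters=100):
    """Augmented Lagrangian loop for min gamma(x) s.t. Ax - b >= 0, lo <= x <= hi."""
    m, n = len(A), len(A[0])
    # Squared Frobenius norm: an upper bound on sigma_max(A)^2.
    sigma_sq = sum(a * a for row in A for a in row)

    def shifted_residual(x, y, mu):
        # z = A x - b - y / mu
        return [sum(A[i][j] * x[j] for j in range(n)) - b[i] - y[i] / mu
                for i in range(m)]

    x, y, mu = list(x0), [0.0] * m, mu0
    for _ in range(outer_iters):
        L = L_gamma / mu + sigma_sq            # Lipschitz bound in the spirit of (3.6)
        for _ in range(inner_iters):           # stand-in for ORACLE
            z = shifted_residual(x, y, mu)
            r = [min(zi, 0.0) for zi in z]     # z - Pi_K(z) for K = R^m_+
            g = grad_gamma(x)
            grad = [g[j] / mu + sum(A[i][j] * r[i] for i in range(m))
                    for j in range(n)]         # gradient of gamma/mu + f_k, see (3.2)
            x = [min(max(x[j] - grad[j] / L, lo), hi) for j in range(n)]
        z = shifted_residual(x, y, mu)
        y = [mu * max(-zi, 0.0) for zi in z]   # y_{k+1} = mu_k [Pi_K(z) - z]
        mu *= beta
    return x, y
```

On the one-dimensional instance $\min\{\tfrac{1}{2}x^2 : x - 1 \ge 0,\ -5 \le x \le 5\}$, whose solution is $x^* = 1$ with multiplier $y^* = 1$, the call `alcc_sketch(lambda v: list(v), 1.0, [[1.0]], [1.0], -5.0, 5.0, [0.0])` should return values close to `([1.0], [1.0])`.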



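3.2. Convergence properties of ALCC algorithm. In this section we investigate the convergence rate of the ALCC algorithm.

Lemma 3.2. Let $\mathcal{K} \subset \mathbb{R}^n$ denote a closed, convex cone and $x \in \mathbb{R}^n$. Then $x - \Pi_\mathcal{K}(x) \in -\mathcal{K}^*$ and $\langle x - \Pi_\mathcal{K}(x), \Pi_\mathcal{K}(x)\rangle = 0$, where $\mathcal{K}^* = \{s \in \mathbb{R}^n : \langle s, x\rangle \ge 0\ \ \forall x \in \mathcal{K}\}$. Finally, if $x \in -\mathcal{K}^*$, then $\Pi_\mathcal{K}(x) = 0$.
Proof. See appendix for the proof. □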
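The three claims of Lemma 3.2 can be checked numerically on a concrete cone. The Python sketch below uses the second-order cone $\{(v, t) : \|v\|_2 \le t\}$, which is self-dual, together with its standard closed-form projection; the choice of cone and the function name are assumptions made only for this illustration.

```python
import math


def project_soc(v, t):
    """Euclidean projection of the point (v, t) onto {(v, t) : ||v||_2 <= t}."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v <= t:                      # already inside the cone
        return list(v), t
    if norm_v <= -t:                     # inside the polar cone: projects to the origin
        return [0.0] * len(v), 0.0
    scale = (norm_v + t) / (2.0 * norm_v)
    return [scale * vi for vi in v], scale * norm_v
```

For the point $(v, t) = ((3, 4), 0)$ the projection is $((1.5, 2), 2.5)$; the residual $((1.5, 2), -2.5)$ is orthogonal to the projection and its negative lies in the cone, in line with the lemma.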

From Lemma 3.2 it follows that the dual variable $y_{k+1}$ computed in the dual update step of the ALCC algorithm satisfies $y_{k+1} \in \mathcal{K}^*$. Also note that for all $k \ge 1$,

$$y_{k+1} = y_k + \mu_k\nabla_y\mathcal{L}_{\mu_k}(x_k, y_k). \tag{3.9}$$

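Next, we establish that the sequence of dual variables $\{y_k\}$ generated by the ALCC algorithm is bounded for an appropriately chosen parameter sequence.

Lemma 3.3. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ be the sequence of primal-dual ALCC iterates for a given input parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.3). Then, for all $k \ge 1$,

$$0 \le \mathcal{L}_{\mu_k}(x_k, y_k) - g_{\mu_k}(y_k) \le \xi_k, \tag{3.10}$$

where

$$\xi_k := \max\{\alpha_k,\ \eta_k\, d_{\chi_k^*}(x_k)\}, \tag{3.11}$$

and $\chi_k^* \subset \chi$ is defined in (3.7).

Proof. Fix $k \ge 1$. Suppose $x_k = $ ORACLE($P_k$, $y_k$, $\alpha_k$, $\eta_k$, $\mu_k$) satisfies (3.4). Then we have

$$P_k(x_k, y_k) \le \inf_{x \in \chi} P_k(x, y_k) + \frac{\alpha_k}{\mu_k} \le \inf_{x \in \chi} P_k(x, y_k) + \frac{\xi_k}{\mu_k}. \tag{3.12}$$

Suppose instead that $x_k = $ ORACLE($P_k$, $y_k$, $\alpha_k$, $\eta_k$, $\mu_k$) satisfies (3.5). Then, there exists $q_k \in \partial_x P_k(x_k, y_k) + \partial\mathbf{1}_\chi(x_k)$ such that $\|q_k\|_2 \le \frac{\eta_k}{\mu_k}$. Since $P_k(x, y_k) + \mathbf{1}_\chi(x)$ is convex in $x$, it follows that

$$P_k(x_k, y_k) \le \inf_{x \in \chi_k^*}\{P_k(x, y_k) + \langle q_k, x_k - x\rangle\} \le \inf_{x \in \chi} P_k(x, y_k) + \frac{\eta_k\, d_{\chi_k^*}(x_k)}{\mu_k}. \tag{3.13}$$

Since $P_k(\cdot, y_k)$ and $\frac{1}{\mu_k}\mathcal{L}_{\mu_k}(\cdot, y_k)$ differ only by a term that does not depend on $x$, the desired result follows from (3.12) and (3.13). □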

The following result was originally established in [13] for $\mathcal{K} = \mathbb{R}^m_+$. We state and prove the extension to general convex cones for completeness.

Theorem 3.4. Suppose $B := \sum_{k=1}^{\infty}\sqrt{2\xi_k\mu_k} < \infty$, where $\xi_k$ is defined in (3.11). Then, for all $k \ge 1$, $\|y_k\|_2 \le B + \|y^*\|_2$, where $y^*$ is any KKT point of (P).

Proof. Lemma 3.3 and Theorem 2.5 imply that $\sqrt{2\xi_k\mu_k} \ge \|\mu_k\nabla_y\mathcal{L}_{\mu_k}(x_k, y_k) - \mu_k\nabla g_{\mu_k}(y_k)\|_2$. Next, adding and subtracting $y_k$, and using (2.12) and (3.9), we get

$$\sqrt{2\xi_k\mu_k} \ge \|\mu_k\nabla_y\mathcal{L}_{\mu_k}(x_k, y_k) + y_k - (y_k + \mu_k\nabla g_{\mu_k}(y_k))\|_2 = \|y_{k+1} - \pi_{\mu_k}(y_k)\|_2. \tag{3.14}$$

Since $\sum_{k=1}^{\infty}\sqrt{2\xi_k\mu_k} < \infty$, it follows that $\xi_k\mu_k \to 0$. Thus, $\lim_{k \in \mathbb{Z}_+}(y_{k+1} - \pi_{\mu_k}(y_k)) = 0$.

Assumption 1.1 guarantees that a KKT point $y^* \in \mathcal{K}^*$ exists. Since $y^* \in \mathrm{argmax}_{y \in \mathbb{R}^m}\, g_0(y)$, Theorem 2.4 implies that $y^* \in \mathrm{argmax}_{y \in \mathbb{R}^m}\, g_{\mu_k}(y)$ for all $k \ge 1$. Therefore, $\nabla g_{\mu_k}(y^*) = 0$, and consequently, by (2.12), $y^* = \pi_{\mu_k}(y^*)$. Since $\pi_{\mu_k}$ is non-expansive, it follows that

$$\|\pi_{\mu_k}(y_k) - y^*\|_2 = \|\pi_{\mu_k}(y_k) - \pi_{\mu_k}(y^*)\|_2 \le \|y_k - y^*\|_2.$$

Hence,

$$\begin{aligned}
\|y_{k+1} - y^*\|_2 &\le \|y_{k+1} - \pi_{\mu_k}(y_k)\|_2 + \|\pi_{\mu_k}(y_k) - y^*\|_2, \\
&\le \|y_{k+1} - \pi_{\mu_k}(y_k)\|_2 + \|y_k - y^*\|_2, \\
&\le \sqrt{2\xi_k\mu_k} + \|y_k - y^*\|_2.
\end{aligned} \tag{3.15}$$

Since $y_1 = 0$, the desired result is obtained by summing the above inequality over $k$. □

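In the rest of this section we investigate the convergence properties of ALCC for the multiplier sequence $\{\alpha_k, \eta_k, \mu_k\}$ defined as follows

$$\mu_k = \beta^k\mu_0, \qquad \alpha_k = \frac{1}{k^{2(1+c)}\beta^k}\,\alpha_0, \qquad \eta_k = \frac{1}{k^{2(1+c)}\beta^k}\,\eta_0 \tag{3.16}$$

for all $k \ge 1$, where $\beta > 1$, and $c$, $\alpha_0$, $\eta_0$ and $\mu_0$ are all strictly positive. Thus, $\alpha_k \searrow 0$, $\eta_k \searrow 0$, and $\mu_k \nearrow \infty$. Let $\infty > \Delta_\chi := \max_{x \in \chi}\max_{x' \in \chi}\|x - x'\|_2$ denote the diameter of the compact set $\chi$. Clearly, $d_{\chi_k^*}(x_k) \le \Delta_\chi$ for all $k \ge 1$, where $\chi_k^* \subset \chi$ is defined in (3.7). Hence, from the definition of $\xi_k$ in (3.11), it follows that

$$\sqrt{\xi_k\mu_k} \le \frac{1}{k^{1+c}}\sqrt{\mu_0\max\{\alpha_0,\ \eta_0\Delta_\chi\}}, \qquad \forall k \ge 1, \tag{3.17}$$

and $\sum_{k=1}^{\infty}\sqrt{\xi_k\mu_k} < \infty$ as required by Theorem 3.4. First, we lower bound the sub-optimality as a function of primal infeasibility of the iterates.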
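The parameter schedule (3.16) and the bound (3.17) are easy to tabulate. The Python sketch below generates $(\alpha_k, \eta_k, \mu_k)$ and evaluates $\sqrt{\max\{\alpha_k, \eta_k\Delta_\chi\}\,\mu_k}$, which upper-bounds $\sqrt{\xi_k\mu_k}$ because $d_{\chi_k^*}(x_k) \le \Delta_\chi$. The default values of $\beta$, $c$, $\alpha_0$, $\eta_0$, $\mu_0$ and the function names are illustrative assumptions, not values recommended by the paper.

```python
import math


def alcc_parameters(k, beta=2.0, c=0.5, alpha0=1.0, eta0=1.0, mu0=1.0):
    """Return (alpha_k, eta_k, mu_k) following the schedule (3.16), for k >= 1."""
    decay = k ** (2.0 * (1.0 + c)) * beta ** k
    return alpha0 / decay, eta0 / decay, mu0 * beta ** k


def sqrt_xi_mu_bound(k, diameter, **params):
    """Upper bound on sqrt(xi_k * mu_k) obtained by replacing d(x_k) with the diameter."""
    alpha_k, eta_k, mu_k = alcc_parameters(k, **params)
    return math.sqrt(max(alpha_k, eta_k * diameter) * mu_k)
```

With these defaults the terms decay like $k^{-(1+c)}$, so their partial sums stay bounded, which is the summability required by Theorem 3.4.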

Theorem 3.5. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ be the sequence of primal-dual ALCC iterates corresponding to a
parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.3). Then

$$p(x_k) - p^* \geq -\|y^*\|_2\, d_{\mathcal{K}}\!\left(Ax_k - b - \frac{y_k}{\mu_k}\right) + \frac{1}{\mu_k} \langle y_k, y^* \rangle,$$

where $y^* \in \mathcal{K}^*$ denotes any KKT point of (P) and $p^*$ denotes the optimal value of (P) given in (1.1).

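Proof. The dual function $g_0(y) = -\infty$ when $y \notin \mathcal{K}^*$; and for all $y \in \mathcal{K}^*$, the dual function $g_0$ of (P) can
be equivalently written as

$$\begin{aligned}
g_0(y) &= \langle b, y \rangle + \inf_{x \in \mathbb{R}^n} \left\{ p(x) + \mathbf{1}_\chi(x) - \langle A^T y, x \rangle \right\}, \\
&= \langle b, y \rangle - (p + \mathbf{1}_\chi)^*(A^T y).
\end{aligned}$$

Hence, the dual of (P) is

$$(D): \quad \max_{y \in \mathcal{K}^*}\ \langle b, y \rangle - (p + \mathbf{1}_\chi)^*(A^T y). \qquad (3.18)$$

Any KKT point $y^* \in \mathcal{K}^*$ is an optimal solution of (3.18). Let $b_k := b + \frac{y_k}{\mu_k}$ for all $k \geq 1$. For $\kappa > 0$, define

$$\begin{aligned}
(P_k): \quad & \min_{x \in \chi} \left\{ p(x) + \kappa\, d_{\mathcal{K}}(Ax - b_k) \right\}, \\
=\ & \min_{x \in \mathbb{R}^n,\ s \in \mathcal{K}} \left\{ p(x) + \mathbf{1}_\chi(x) + \kappa \|Ax - b_k - s\|_2 \right\}, \\
=\ & \max_{\|w\|_2 \leq \kappa}\ \min_{x \in \mathbb{R}^n,\ s \in \mathcal{K}} \left\{ p(x) + \mathbf{1}_\chi(x) + \langle w, Ax - b_k - s \rangle \right\}, \\
=\ & \max_{\|w\|_2 \leq \kappa} \left\{ -\langle b_k, w \rangle + \inf_{s \in \mathcal{K}} \langle -w, s \rangle - \sup_{x \in \mathbb{R}^n} \left\{ \langle -A^T w, x \rangle - (p(x) + \mathbf{1}_\chi(x)) \right\} \right\}.
\end{aligned}$$

Since $\inf_{s \in \mathcal{K}} \langle -w, s \rangle > -\infty$ only if $-w \in \mathcal{K}^*$, by setting $y = -w$ we obtain the following dual problem $(D_k)$
of $(P_k)$:

$$(D_k): \quad \max_{y \in \mathcal{K}^*,\ \|y\|_2 \leq \kappa} \left\{ \langle b_k, y \rangle - (p + \mathbf{1}_\chi)^*(A^T y) \right\}.$$

Since $y^* \in \mathcal{K}^*$ is feasible to $(D_k)$ for $\kappa = \|y^*\|_2$, and $x_k \in \chi$ is feasible to $(P_k)$, weak duality implies that

$$p(x_k) + \|y^*\|_2\, d_{\mathcal{K}}(Ax_k - b_k) \geq \langle b, y^* \rangle - (p + \mathbf{1}_\chi)^*(A^T y^*) + \frac{1}{\mu_k} \langle y_k, y^* \rangle = p^* + \frac{1}{\mu_k} \langle y_k, y^* \rangle,$$

where the equality follows from strong duality between (P) and (D). □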
Next, we upper bound the suboptimality. 
Theorem 3.6. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ be the sequence of primal-dual ALCC iterates corresponding to a
parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.3). Let $p^*$ denote the optimal value of (P). Then

$$P_k(x_k, y_k) - \frac{1}{\mu_k}\, p^* \leq \frac{\xi_k}{\mu_k} + \frac{1}{2\mu_k^2} \|y_k\|_2^2, \qquad (3.19)$$

where $\xi_k = \max\{\alpha_k,\ \eta_k\, d_{\chi^*}(x_k)\}$ and $\chi^*$ denotes the set of optimal solutions to (P).

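Proof. Fix $k \geq 1$ and let $x^* \in \chi^*$. Suppose that $x_k = \text{ORACLE}(P_k, y_k, \alpha_k, \eta_k, \mu_k)$ satisfies (3.4). Then,
since $x^* \in \chi$, from p.l2p it follows that

$$P_k(x_k, y_k) \leq \inf_{x \in \chi} P_k(x, y_k) + \frac{\alpha_k}{\mu_k} \leq P_k(x^*, y_k) + \frac{\alpha_k}{\mu_k}. \qquad (3.20)$$

Next, suppose that $x_k = \text{ORACLE}(P_k, y_k, \alpha_k, \eta_k, \mu_k)$ satisfies (3.5). Then, since $P_k(x, y_k) + \mathbf{1}_\chi(x)$ is convex
in $x$ for all $k \geq 1$, it follows that

$$P_k(x_k, y_k) \leq P_k(x^*, y_k) + \langle q_k, x_k - x^* \rangle \leq P_k(x^*, y_k) + \frac{\eta_k}{\mu_k} \|x_k - x^*\|_2. \qquad (3.21)$$

From (3.20) and (3.21), it follows that

$$P_k(x_k, y_k) - \frac{1}{\mu_k}\, p^* \leq \frac{1}{2}\, d_{\mathcal{K}}\!\left(Ax^* - b - \frac{y_k}{\mu_k}\right)^2 + \frac{\max\{\alpha_k,\ \eta_k \|x_k - x^*\|_2\}}{\mu_k}. \qquad (3.22)$$

Since $Ax^* - b \in \mathcal{K}$, Lemma A.2 implies that $d_{\mathcal{K}}\!\left(Ax^* - b - \frac{y_k}{\mu_k}\right) \leq \frac{\|y_k\|_2}{\mu_k}$. Moreover, since $x^* \in \chi^*$ is
arbitrary, from (3.22) it follows that

$$P_k(x_k, y_k) - \frac{1}{\mu_k}\, p^* \leq \frac{\|y_k\|_2^2}{2\mu_k^2} + \frac{\max\{\alpha_k,\ \eta_k \inf_{x^* \in \chi^*} \|x_k - x^*\|_2\}}{\mu_k}. \qquad (3.23)$$

□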

Note that since $f_k(\cdot) \geq 0$, we have $P_k(x_k, y_k) \geq \frac{1}{\mu_k}\, p(x_k)$ for all $k \geq 1$. Hence,

$$p(x_k) - p^* \leq \xi_k + \frac{1}{2\mu_k} \|y_k\|_2^2. \qquad (3.24)$$

Now, we establish a bound on the infeasibility of the primal ALCC iterate sequence. 

Theorem 3.7. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ denote the sequence of primal-dual ALCC iterates for a parameter
sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.3) and $y^* \in \mathcal{K}^*$ be a KKT point of (P). Then

$$d_{\mathcal{K}}(Ax_k - b) \leq \frac{\|y_k\|_2 + \|y_{k+1} - y_k\|_2}{\mu_k}$$

for all $k \geq 1$, where $\xi_k = \max\{\alpha_k,\ \eta_k\, d_{\chi^*}(x_k)\}$ and $\chi^*$ denotes the set of optimal solutions to (P).
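Proof. From Step 4 in ALCC algorithm, it follows that

$$\begin{aligned}
\frac{y_{k+1} - y_k}{\mu_k} &= \Pi_{\mathcal{K}}\!\left(Ax_k - b - \frac{y_k}{\mu_k}\right) - (Ax_k - b), \\
&= \Pi_{\mathcal{K}}\!\left(Ax_k - b - \frac{y_k}{\mu_k}\right) - \Pi_{\mathcal{K}}(Ax_k - b) + \Pi_{\mathcal{K}}(Ax_k - b) - (Ax_k - b).
\end{aligned}$$

Hence,

$$d_{\mathcal{K}}(Ax_k - b) \leq \frac{\|y_{k+1} - y_k\|_2}{\mu_k} + \left\| \Pi_{\mathcal{K}}\!\left(Ax_k - b - \frac{y_k}{\mu_k}\right) - \Pi_{\mathcal{K}}(Ax_k - b) \right\|_2.$$

The result now follows from the fact that $\Pi_{\mathcal{K}}$ is non-expansive. □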
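The bound of Theorem 3.7 can be checked numerically on a toy instance. The sketch below assumes that $\mathcal{K}$ is the nonnegative orthant (so $\mathcal{K}^* = \mathcal{K}$) and that the dual update of Step 4 has the form $y_{k+1} = \mu_k\,(\Pi_{\mathcal{K}}(v) - v)$ with $v = Ax_k - b - y_k/\mu_k$, which is what the first identity in the proof above implies. The vector `residual` stands in for $Ax_k - b$, and all function names are hypothetical.

```python
import math
import random

def project_nonneg(v):
    """Euclidean projection onto the nonnegative orthant (the assumed cone K)."""
    return [max(vi, 0.0) for vi in v]

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def dist_nonneg(v):
    """Distance d_K(v) from v to the nonnegative orthant."""
    return norm2([pi - vi for pi, vi in zip(project_nonneg(v), v)])

def dual_update(residual, y, mu):
    """Assumed Step 4: y_next = mu * (Pi_K(v) - v) with v = residual - y / mu."""
    v = [ri - yi / mu for ri, yi in zip(residual, y)]
    return [mu * (pi - vi) for pi, vi in zip(project_nonneg(v), v)]

def bound_slack(residual, y, mu):
    """(||y|| + ||y_next - y||) / mu - d_K(residual); Theorem 3.7 says this is >= 0."""
    y_next = dual_update(residual, y, mu)
    step = norm2([a - b for a, b in zip(y_next, y)])
    return (norm2(y) + step) / mu - dist_nonneg(residual)

if __name__ == "__main__":
    rng = random.Random(0)
    residual = [rng.uniform(-1, 1) for _ in range(5)]  # stands in for A x_k - b
    y = [rng.uniform(0, 1) for _ in range(5)]          # current dual iterate in K*
    for mu in (1.0, 10.0, 100.0):
        print(mu, dist_nonneg(residual), bound_slack(residual, y, mu))
```

Note that the inequality is purely a consequence of the update rule and the non-expansiveness of the projection, so it holds for any residual and any penalty value in this sketch, not only for iterates returned by the oracle.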
In the next theorem we establish the convergence rate of ALCC algorithm.

Theorem 3.8. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ denote the sequence of primal-dual ALCC iterates for a parameter
sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.16). Then for all $\epsilon > 0$, $d_{\mathcal{K}}(Ax_k - b) \leq \epsilon$ and $|p(x_k) - p^*| \leq \epsilon$ within
$O(\log(\epsilon^{-1}))$ Oracle calls, which require solving at most $O(\epsilon^{-1} \log(\epsilon^{-1}))$ problems of the form (1.2).

Proof. To simplify the notation, let $\alpha_0 = \eta_0 = \mu_0 = 1$, and, without loss of generality, assume that
$1 \leq \mathcal{D}$, where $\mathcal{D} := \max_{x \in \chi} d_{\chi^*}(x) < \infty$. Then, clearly $d_{\chi^*}(x_k) \leq \mathcal{D}$ for all $k \geq 1$.

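First, (3.25) implies that

$$d_{\mathcal{K}}(Ax_k - b) \leq \frac{1}{\beta^k} \left( \|y_k\|_2 + \|y_{k+1} - y_k\|_2 \right). \qquad (3.26)$$

Moreover, from Step 4 of ALCC algorithm, it follows that

$$d_{\mathcal{K}}\!\left(Ax_k - b - \frac{y_k}{\mu_k}\right) \leq \frac{1}{\beta^k} \|y_{k+1}\|_2. \qquad (3.27)$$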
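Now, Theorem 3.5 and (3.24) together imply that

$$|p(x_k) - p^*| \leq \frac{1}{\beta^k} \max\left\{ \|y^*\|_2 \left( \|y_{k+1}\|_2 + \|y_k\|_2 \right),\ \frac{\|y_k\|_2^2}{2} + \mathcal{D} \right\}. \qquad (3.28)$$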
Theorem 3.4 shows that $\{y_k\}$ is a bounded sequence. Hence, from (3.26) and (3.28), we have

$$d_{\mathcal{K}}(Ax_k - b) = O\!\left(\frac{1}{\beta^k}\right), \qquad |p(x_k) - p^*| = O\!\left(\frac{1}{\beta^k}\right). \qquad (3.29)$$

Hence, (3.29) implies that for all $\epsilon > 0$, an $\epsilon$-optimal and $\epsilon$-feasible solution to (P) can be computed within
$O(\log(\epsilon^{-1}))$ iterations of ALCC algorithm.

The values of $L_{\gamma_k}$, $\alpha_k$ and $\mu_k$ are given respectively in (3.6) and (3.16). Substituting them in the
expression for $\ell_{\max}(k)$ in (3.8) and using the fact that $d_{\chi_k^*}(x_{k-1}) \leq \Delta_\chi$, we obtain

$$\ell_{\max}(k) = O\!\left(\beta^k k^{1+c}\right). \qquad (3.30)$$



Hence, (3.30) implies that at most $O(\epsilon^{-1} \log(\epsilon^{-1}))$ problems of the form (1.2) are solved during $O(\log(\epsilon^{-1}))$
iterations of ALCC algorithm. Indeed, let $N_\epsilon \in \mathbb{Z}_+$ denote the total number of problems of the form (1.2) solved
to compute an $\epsilon$-optimal and $\epsilon$-feasible solution to (P). From (3.29) and (3.30), it follows that there exist
$c_1 > 0$ and $c_2 > 0$ such that




B-l \ € 

k=l k=l 



□ 
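To get a feel for the two counts in Theorem 3.8, the following Python sketch computes how many outer iterations are needed before a quantity decaying like $\text{const}/\beta^k$ (the rate in (3.29)) falls below $\epsilon$, and tallies a per-iteration cost proportional to $\beta^k k^{1+c}$ (the growth in (3.30)). The constants `const` and `c1`, the default values of $\beta$ and $c$, and the function names are hypothetical; the sketch only evaluates the cost model and makes no claim about the constants hidden in the $O(\cdot)$ notation.

```python
import math

def outer_iterations(eps, beta=2.0, const=1.0):
    """Smallest k >= 1 with const / beta**k <= eps, cf. the O(1/beta^k) rates in (3.29)."""
    k = 1
    while const / beta ** k > eps:
        k += 1
    return k

def inner_work(num_outer, beta=2.0, c=0.5, c1=1.0):
    """Sum over k = 1..num_outer of c1 * beta**k * k**(1+c), the cost model of (3.30)."""
    return sum(c1 * beta ** k * k ** (1.0 + c) for k in range(1, num_outer + 1))

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-4, 1e-8):
        k = outer_iterations(eps)
        print(eps, k, inner_work(k))
```

The outer count grows like $\log(1/\epsilon)$, while the inner tally is dominated by its last few terms because the per-iteration cost grows geometrically.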

Corollary 3.9. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ denote the sequence of primal-dual ALCC iterates for a
parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.16). Then $\lim_{k \in \mathbb{Z}_+} p(x_k) = p^*$ and $\lim_{k \in \mathbb{Z}_+} d_{\mathcal{K}}(Ax_k - b) = 0$.
Moreover, for all $S \subset \mathbb{Z}_+$ such that $\bar{x} = \lim_{k \in S} x_k$, $\bar{x}$ is an optimal solution to (P).

Proof. Since $\chi$ is compact, the Bolzano-Weierstrass theorem implies that there exists a subsequence $S \subset \mathbb{Z}_+$
such that $\bar{x} = \lim_{k \in S} x_k$ exists. Moreover, taking the limit of both sides of (3.26) and (3.28), we have
$\lim_{k \in \mathbb{Z}_+} d_{\mathcal{K}}(Ax_k - b) = 0$ and $\lim_{k \in \mathbb{Z}_+} p(x_k) = p^*$. Hence, $\lim_{k \in S} d_{\mathcal{K}}(Ax_k - b) = 0$ and $\lim_{k \in S} p(x_k) = p^*$. □
Note that even though $p(x_k) \to p^*$, the primal iterates themselves may not converge.

Rockafellar [13] proved that the dual iterate sequence $\{y_k\}$ computed via (1.11)-(1.12) converges to a
KKT point of (1.9). We want to extend this result to the case where $\mathcal{K}$ is a general convex cone. The proof
in [13] uses the fact that the penalty multiplier $\mu$ is fixed in (1.11)-(1.12), and it is not immediately clear how
to extend this result to the setting with $\{\mu_k\}$ such that $\mu_k \nearrow \infty$. In Theorem 3.10 we extend Rockafellar's
result in [13] to arbitrary convex cones $\mathcal{K}$ when $f(x) = Ax - b$ and the penalty multipliers $\mu_k \nearrow \infty$. After
we independently proved Theorem 3.10, we became aware of an earlier work of Rockafellar [14] where he
also extends the dual convergence result in [15] to the setting where $\{\mu_k\}$ is an increasing sequence. See
Section [L^ for a detailed discussion of our contribution in relation to this earlier work by Rockafellar. 

Theorem 3.10. Let $\{x_k, y_k\} \in \chi \times \mathcal{K}^*$ denote the sequence of primal-dual ALCC iterates corresponding
to a parameter sequence $\{\alpha_k, \eta_k, \mu_k\}$ satisfying (3.16). Then $\bar{y} := \lim_{k \in \mathbb{Z}_+} y_k$ exists and $\bar{y}$ is a KKT point of
(P) in (1.1).

Proof. It follows from (3.14), which holds for all $k \geq 1$, that

$$\lim_{k \in \mathbb{Z}_+} \|y_{k+1} - \pi_{\mu_k}(y_k)\|_2 \leq \lim_{k \in \mathbb{Z}_+} \sqrt{2\,\xi_k \mu_k} = 0, \qquad (3.31)$$

where $\xi_k$ is defined in (3.11). Moreover, Theorem 3.4 shows that $\{y_k\}$ is a bounded sequence. Hence, (3.31)
implies that $\{\pi_{\mu_k}(y_k)\}$ is also a bounded sequence.

From (2.11), it follows that $g_{\mu_k}(y_k) = g_0(\pi_{\mu_k}(y_k)) - \frac{1}{2\mu_k}\|\pi_{\mu_k}(y_k) - y_k\|_2^2$ and $g_{\mu_k}(y_k) \geq g_0(y^*) - \frac{1}{2\mu_k}\|y^* - y_k\|_2^2$
for any KKT point $y^*$. Since $g_0(y^*) = p^*$, we have that

$$g_0(\pi_{\mu_k}(y_k)) \geq p^* - \frac{1}{2\mu_k} \|y^* - y_k\|_2^2. \qquad (3.32)$$

Since $\{y_k\}$ is bounded, taking the limit inferior of both sides of (3.32) we obtain

$$\liminf_{k \in \mathbb{Z}_+} g_0(\pi_{\mu_k}(y_k)) \geq p^* - \lim_{k \in \mathbb{Z}_+} \frac{1}{2\mu_k} \|y^* - y_k\|_2^2 = p^*. \qquad (3.33)$$

Moreover, since $\pi_{\mu_k}(y_k) \in \mathcal{K}^*$ for all $k \geq 1$, weak duality implies that $\limsup_{k \in \mathbb{Z}_+} g_0(\pi_{\mu_k}(y_k)) \leq p^*$. Thus,
using (3.33), we have that

$$\lim_{k \in \mathbb{Z}_+} g_0(\pi_{\mu_k}(y_k)) = p^*. \qquad (3.34)$$

Since $\{\pi_{\mu_k}(y_k)\}$ is bounded, there exists $S \subset \mathbb{Z}_+$ and $\bar{y} \in \mathcal{K}^*$ such that

$$\bar{y} := \lim_{k \in S} \pi_{\mu_k}(y_k) = \lim_{k \in S} y_{k+1}, \qquad (3.35)$$

where the last equality follows from (3.31).
From (IZ3t and (I2T61). it follows that 



$$g_0(y) = \inf_{x \in \chi,\ s \in \mathcal{K}} \left\{ p(x) - \langle y, Ax - s - b \rangle \right\}.$$

Hence, $-g_0$ is a pointwise supremum of linear functions, which are always closed. Lemma 3.1.11 in [10]
establishes that $-g_0$ is a closed convex function. Since a closed convex function is always lower semicontinuous,
we can conclude that $-g_0$ is lower semicontinuous, or equivalently, $g_0$ is an upper semicontinuous function.
Hence, (3.34) and (3.35) imply that

$$p^* = \lim_{k \in \mathbb{Z}_+} g_0(\pi_{\mu_k}(y_k)) = \limsup_{k \in S} g_0(\pi_{\mu_k}(y_k)) \leq g_0(\bar{y}) \leq p^*,$$

where the first inequality is due to upper semicontinuity of $g_0$ and the last one is due to weak duality and
the fact that $\bar{y} \in \mathcal{K}^*$. Thus, we have

$$g_0(\bar{y}) = \lim_{k \in \mathbb{Z}_+} g_0(\pi_{\mu_k}(y_k)) = p^*, \qquad (3.36)$$

which implies that $\bar{y} \in \mathcal{K}^*$ is a KKT point of (1.1).

Moreover, since (3.15) holds for any KKT point, we can substitute $\bar{y}$ for $y^*$ in the expression. Thus, we
have

$$\|y_\ell - \bar{y}\|_2 \leq \|y_k - \bar{y}\|_2 + \sum_{t \geq k} \sqrt{2\,\xi_t \mu_t}, \qquad \forall \ell > k. \qquad (3.37)$$

Fix $\epsilon > 0$. Since the sequence $\{\sqrt{\xi_k \mu_k}\}$ is summable, it follows that there exists $N_1 \in \mathbb{Z}_+$ such that
$\sum_{t \geq k} \sqrt{2\,\xi_t \mu_t} \leq \frac{\epsilon}{2}$ for all $k \geq N_1$. Moreover, since $\{y_k\}_{k \in S}$ converges to $\bar{y}$, it follows that there exists
$N_2 \in S$ such that $N_2 \geq N_1$ and $\|y_{N_2} - \bar{y}\|_2 \leq \frac{\epsilon}{2}$. Hence, (3.37) implies that $\|y_\ell - \bar{y}\|_2 \leq \epsilon$ for all $\ell > N_2$.
Therefore, $\lim_{k \in \mathbb{Z}_+} y_k = \bar{y}$. □

4. Conclusion. In this paper we build on previously known augmented Lagrangian algorithms for 
convex problems with standard inequality constraints [HI [TJ] to develop the ALCC algorithm that solves 
convex problems with conic constraints. In each iteration of the ALCC algorithm, a sequence of "penalty" 
Lagrangians (see (2.4)) are inexactly minimized over a "simple" closed convex set. We show that recent
results on optimal first-order algorithms [3], [17] (see also [10], [11]) can be used to bound the number of
basic operations needed in each iteration to inexactly minimize the "penalty" Lagrangian sub-problem. By
carefully controlling the growth of the penalty parameter $\mu_k$ that controls the iteration complexity of ALCC
algorithm, and the decay of parameter $\alpha_k$ that controls the suboptimality of each sub-problem, we show
that ALCC algorithm is a theoretically efficient first-order, inexact augmented Lagrangian algorithm for 
structured non-smooth conic convex programming. 
Appendix A. Proofs of technical results.

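Lemma A.1. Let $f(\cdot) = \frac{1}{2} d_{\mathcal{K}}^2(\cdot)$. Then $f$ is convex, and $\nabla f(y) = y - \Pi_{\mathcal{K}}(y)$ is Lipschitz continuous with
Lipschitz constant equal to 1. Moreover, both $\Pi_{\mathcal{K}}(\cdot)$ and $\Pi_{\mathcal{K}}^c(z) := z - \Pi_{\mathcal{K}}(z)$ are nonexpansive.

Proof. The indicator function $\mathbf{1}_{\mathcal{K}}(\cdot)$ of a closed convex set $\mathcal{K}$ is a proper closed convex function, and

$$f(y) = \min_{z \in \mathbb{R}^m} \left\{ \mathbf{1}_{\mathcal{K}}(z) + \frac{1}{2}\|z - y\|_2^2 \right\} = \min_{z \in \mathcal{K}} \frac{1}{2}\|z - y\|_2^2$$

is the Moreau regularization of the function $\mathbf{1}_{\mathcal{K}}(\cdot)$, and the projection operator $\Pi_{\mathcal{K}}(\cdot)$ is the corresponding
Moreau proximal map. Therefore, all the results of this lemma follow from Lemma 2.3. □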
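The two claims of Lemma A.1 are easy to sanity-check numerically. The sketch below assumes, purely for illustration, that $\mathcal{K}$ is the nonnegative orthant; it compares the formula $\nabla f(y) = y - \Pi_{\mathcal{K}}(y)$ with a central finite-difference approximation of the gradient of $f = \frac{1}{2} d_{\mathcal{K}}^2$. The function names and the step size `h` are hypothetical choices.

```python
import math
import random

def project_nonneg(v):
    """Euclidean projection onto the nonnegative orthant (the assumed cone K)."""
    return [max(vi, 0.0) for vi in v]

def half_sq_dist(y):
    """f(y) = 0.5 * d_K(y)^2 for K the nonnegative orthant."""
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, project_nonneg(y)))

def grad_formula(y):
    """Gradient stated in Lemma A.1: y - Pi_K(y)."""
    return [yi - pi for yi, pi in zip(y, project_nonneg(y))]

def grad_fd(y, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    g = []
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        dn = list(y)
        dn[i] -= h
        g.append((half_sq_dist(up) - half_sq_dist(dn)) / (2.0 * h))
    return g

if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.uniform(-3, 3) for _ in range(4)]
    print(grad_formula(y))
    print(grad_fd(y))
```

For this cone the gradient is simply the componentwise negative part of $y$, which makes the Lipschitz constant of 1 visible by inspection.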
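Lemma A.2. For all $y, y' \in \mathbb{R}^m$, $d_{\mathcal{K}}(y) \leq d_{\mathcal{K}}(y + y') + \|y'\|_2$.
Proof.

$$\begin{aligned}
d_{\mathcal{K}}(y) &= \|\Pi_{\mathcal{K}}(y) - y\|_2 = \|\Pi_{\mathcal{K}}(y) - y + \Pi_{\mathcal{K}}(y + y') - \Pi_{\mathcal{K}}(y + y') + y' - y' + y - y\|_2, \\
&\leq \|\Pi_{\mathcal{K}}(y + y') - (y + y')\|_2 + \|\Pi_{\mathcal{K}}^c(y + y') - \Pi_{\mathcal{K}}^c(y)\|_2, \\
&\leq d_{\mathcal{K}}(y + y') + \|y'\|_2,
\end{aligned}$$

where the last inequality follows from the fact that $\Pi_{\mathcal{K}}^c(x) = x - \Pi_{\mathcal{K}}(x)$ is nonexpansive. □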
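Proof of Lemma 3.1.

Proof. For all $y \in \mathbb{R}^m$, the convexity of $f_k(x, y)$ in $x$ follows from Lemma A.1.

Moreover, Lemma A.1 and the chain rule together imply (3.2). Now, fix $x', x'' \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Then
(3.2) implies that

$$\begin{aligned}
\|\nabla_x f_k(x', y) - \nabla_x f_k(x'', y)\|_2 &= \left\| A^T \left[ \left( Ax' - b - \frac{y}{\mu_k} - \Pi_{\mathcal{K}}\!\left( Ax' - b - \frac{y}{\mu_k} \right) \right) - \left( Ax'' - b - \frac{y}{\mu_k} - \Pi_{\mathcal{K}}\!\left( Ax'' - b - \frac{y}{\mu_k} \right) \right) \right] \right\|_2, \\
&\leq \sigma_{\max}(A)\, \|A(x' - x'')\|_2 \leq \sigma_{\max}^2(A)\, \|x' - x''\|_2,
\end{aligned}$$

where the first inequality follows from the non-expansiveness of $\Pi_{\mathcal{K}}^c(\cdot)$. □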
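Proof of Lemma 3.2.

Proof. $\Pi_{\mathcal{K}}(x) \in \operatorname{argmin}_{s \in \mathcal{K}} \|s - x\|_2$ if, and only if, $\langle \Pi_{\mathcal{K}}(x) - x,\ s - \Pi_{\mathcal{K}}(x) \rangle \geq 0$ for all $s \in \mathcal{K}$. Hence,

$$\langle \Pi_{\mathcal{K}}(x) - x,\ s \rangle \geq \langle \Pi_{\mathcal{K}}(x) - x,\ \Pi_{\mathcal{K}}(x) \rangle, \qquad \forall s \in \mathcal{K}. \qquad (A.1)$$

Since the left hand side of (A.1) is bounded from below for all $s \in \mathcal{K}$, it follows that $\Pi_{\mathcal{K}}(x) - x \in \mathcal{K}^*$.
Moreover, since $\Pi_{\mathcal{K}}(x) \in \mathcal{K}$, we have

$$0 = \min_{s \in \mathcal{K}} \langle \Pi_{\mathcal{K}}(x) - x,\ s \rangle \geq \langle \Pi_{\mathcal{K}}(x) - x,\ \Pi_{\mathcal{K}}(x) \rangle \geq 0.$$

This implies $\langle \Pi_{\mathcal{K}}(x) - x,\ \Pi_{\mathcal{K}}(x) \rangle = 0$.

Suppose $x \in -\mathcal{K}^*$. Clearly, $\langle 0 - x,\ s - 0 \rangle \geq 0$ for all $s \in \mathcal{K}$. Thus, it follows that $\Pi_{\mathcal{K}}(x) = 0$. □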
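The three properties established in this proof can be observed on a concrete cone. The sketch below assumes that $\mathcal{K}$ is the second-order cone $\{(t, z) : \|z\|_2 \leq t\}$, which is self-dual, and uses the standard closed-form projection onto it. It returns $\Pi_{\mathcal{K}}(x) - x$ together with the inner product $\langle \Pi_{\mathcal{K}}(x) - x, \Pi_{\mathcal{K}}(x) \rangle$. The choice of cone and the function names are illustrative assumptions, not part of the paper.

```python
import math
import random

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def project_soc(x):
    """Projection of x = (t, z) onto the second-order cone {(t, z): ||z||_2 <= t}."""
    t, z = x[0], x[1:]
    nz = norm2(z)
    if nz <= t:       # x already lies in the cone
        return list(x)
    if nz <= -t:      # x lies in the negative of the (self-dual) cone: project to 0
        return [0.0] * len(x)
    scale = 0.5 * (t + nz)  # here nz > |t| >= 0, so the division below is safe
    return [scale] + [scale * zi / nz for zi in z]

def in_soc(x, tol=1e-9):
    """Membership test for the second-order cone, up to a small tolerance."""
    return norm2(x[1:]) <= x[0] + tol

def lemma_3_2_residuals(x):
    """Return (Pi(x) - x, <Pi(x) - x, Pi(x)>) for the second-order cone."""
    p = project_soc(x)
    diff = [pi - xi for pi, xi in zip(p, x)]
    return diff, sum(di * pi for di, pi in zip(diff, p))

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-2, 2) for _ in range(4)]
    diff, inner = lemma_3_2_residuals(x)
    print(in_soc(diff), inner)
```

Because this cone is self-dual, checking $\Pi_{\mathcal{K}}(x) - x \in \mathcal{K}^*$ reduces to the same membership test as for $\mathcal{K}$ itself.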